22 research outputs found

    Few-shot linguistic grounding of visual attributes and relations using gaussian kernels

    Get PDF
    Understanding complex visual scenes is one of the fundamental problems in computer vision, but learning in this domain is challenging due to the inherent richness of the visual world and the vast number of possible scene configurations. Current state-of-the-art approaches to scene understanding often employ deep networks, which require large and densely annotated datasets. This goes against the seemingly intuitive learning abilities of humans and our ability to generalise from few examples to unseen situations. In this paper, we propose a unified framework for learning visual representations of words denoting attributes such as “blue” and relations such as “left of”, based on Gaussian models operating in a simple, unified feature space. The strength of our model is that it requires only a small number of weak annotations and is able to generalise easily to unseen situations, such as recognising object relations in unusual configurations. We demonstrate the effectiveness of our model on the predicate detection task. Our model outperforms the state of the art on this task in both the normal and zero-shot scenarios, while training on a dataset an order of magnitude smaller.
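    The grounding idea can be sketched in a few lines. This is a toy illustration only, not the paper's actual model or feature space: it fits an axis-aligned Gaussian per attribute word from three weakly annotated examples in an invented RGB-like feature space, then grounds a new observation by maximum log-density.

```python
import math

def fit_gaussian(samples):
    """Fit an axis-aligned Gaussian (per-dimension mean and variance)
    to a small set of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / n, 1e-3)
           for i in range(d)]  # floor the variance for numerical safety
    return mean, var

def log_density(x, model):
    """Log-density of feature vector x under an axis-aligned Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

# Hypothetical feature space and weak annotations (invented values).
examples = {
    "blue": [(0.1, 0.2, 0.9), (0.0, 0.1, 0.8), (0.2, 0.3, 1.0)],
    "red":  [(0.9, 0.1, 0.1), (0.8, 0.2, 0.0), (1.0, 0.3, 0.2)],
}
models = {word: fit_gaussian(xs) for word, xs in examples.items()}

def ground(x):
    """Return the attribute word whose Gaussian best explains x."""
    return max(models, key=lambda w: log_density(x, models[w]))

print(ground((0.05, 0.15, 0.85)))  # a bluish feature vector -> "blue"
```

    Relations such as "left of" would be handled the same way, with the Gaussian fitted over a pairwise feature (e.g. the offset between two object centroids) rather than a per-object one.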

    Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects

    Get PDF
    Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that show no detectable compression strength when two points on the object are pushed towards each other; they include ropes (1D), fabrics (2D) and bags (3D). In general, the many degrees of freedom (DoF) of CDOs introduce severe self-occlusion and complex state–action dynamics, which are significant obstacles for perception and manipulation systems. These challenges exacerbate existing issues with modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods for four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.

    Visualization as Intermediate Representations (VLAIR) for human activity recognition

    Get PDF
    Ambient, binary, event-driven sensor data is useful for many human activity recognition applications such as smart homes and ambient-assisted living. These sensors are privacy-preserving, unobtrusive, inexpensive and easy to deploy in scenarios that require detection of simple activities such as going to sleep or leaving the house. However, classification performance is still a challenge, especially when multiple people share the same space or when different activities take place in the same areas. To improve classification performance, we develop what we call a Visualization as Intermediate Representations (VLAIR) approach. The main idea is to re-represent the data as visualizations (generated pixel images), in a similar way to how visualizations are created for humans to analyze and communicate data. We can then feed these images to a convolutional neural network, whose strength resides in extracting effective visual features. We have tested five variants (mappings) of the VLAIR approach and compared them to a collection of classifiers commonly used in classic human activity recognition. The best of the VLAIR approaches outperforms the best baseline, with a strong advantage in recognising less frequent activities and in distinguishing users and activities in common areas. We conclude the paper with a discussion of why and how VLAIR can be useful in human activity recognition scenarios and beyond.
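    The re-representation step can be sketched as follows. The sensor names, time window and pixel mapping below are invented for illustration and are not one of the paper's five VLAIR mappings: each sensor gets an image row, time maps to columns, and each activation inks one pixel.

```python
def events_to_image(events, sensors, window, width=32):
    """Render binary sensor events in a time window as a 2D pixel grid.

    events: list of (timestamp, sensor_id) activations
    sensors: ordered list of sensor ids (one image row per sensor)
    window: (start, end) timestamps covered by the image
    """
    start, end = window
    rows = {s: [0.0] * width for s in sensors}
    for t, sid in events:
        if start <= t < end and sid in rows:
            col = int((t - start) / (end - start) * width)
            rows[sid][col] = 1.0          # "ink" the pixel for this activation
    return [rows[s] for s in sensors]     # height = len(sensors)

# Hypothetical smart-home sensors and a short event trace (invented).
sensors = ["door", "bed", "kitchen_motion"]
events = [(0.5, "door"), (3.0, "kitchen_motion"), (9.5, "bed")]
img = events_to_image(events, sensors, window=(0.0, 10.0), width=10)
print(img[0])  # the "door" row: the t=0.5 activation lands in column 0
```

    The resulting height-by-width grid can then be fed to a convolutional network as a single-channel image.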

    Interpretable feature maps for robot attention

    Get PDF
    Attention is crucial for autonomous agents interacting with complex environments. In a real scenario, our expectations drive attention, as we look for crucial objects to complete our understanding of the scene. However, most visual attention models to date are designed to drive attention in a bottom-up fashion, without context, and the features they use are not always suitable for driving top-down attention. In this paper, we present an attentional mechanism based on semantically meaningful, interpretable features. We show how to generate a low-level semantic representation of the scene in real time, which can be used to search for objects based on specific features such as colour, shape, orientation, speed, and texture.
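    One way to picture the top-down use of such interpretable features (a minimal sketch, not the paper's implementation): weight each semantic feature map by how well it matches the searched-for object, sum the weighted maps, and attend to the peak. All map names and values below are invented.

```python
def topdown_attention(feature_maps, target_weights):
    """Combine per-pixel interpretable feature maps into an attention map.

    feature_maps: dict name -> 2D list of responses in [0, 1]
    target_weights: dict name -> weight encoding the searched-for object
    """
    names = list(feature_maps)
    h, w = len(feature_maps[names[0]]), len(feature_maps[names[0]][0])
    out = [[0.0] * w for _ in range(h)]
    for name, weight in target_weights.items():
        fmap = feature_maps[name]
        for y in range(h):
            for x in range(w):
                out[y][x] += weight * fmap[y][x]
    return out

def argmax2d(m):
    """Pixel coordinates of the highest attention value."""
    return max(((y, x) for y in range(len(m)) for x in range(len(m[0]))),
               key=lambda p: m[p[0]][p[1]])

# Hypothetical 2x3 interpretable channels: "red" colour and "vertical" edges.
maps = {
    "red":      [[0.9, 0.1, 0.0], [0.2, 0.1, 0.8]],
    "vertical": [[0.1, 0.0, 0.2], [0.9, 0.1, 0.7]],
}
# Searching for a red, vertical object biases both channels equally.
att = topdown_attention(maps, {"red": 1.0, "vertical": 1.0})
print(argmax2d(att))  # (1, 2): the pixel that is both red and vertical
```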

    BIMP: A real-time biological model of multi-scale keypoint detection in V1

    Get PDF
    We present an improved, biologically inspired and multi-scale keypoint operator. Models of single- and double-stopped hypercomplex cells in area V1 of the mammalian visual cortex are used to detect stable points of high complexity at multiple scales. Keypoints represent line and edge crossings, junctions and terminations at fine scales, and blobs at coarse scales. They are detected by applying first and second derivatives to the responses of complex cells, in combination with two inhibition schemes that suppress responses along lines and edges. A number of optimisations make our new algorithm much faster than previous biologically inspired models, achieving real-time performance on modern GPUs and competitive speeds on CPUs. In this paper, we show that the keypoints exhibit state-of-the-art repeatability in standardised benchmarks, often yielding best-in-class performance. This makes them interesting both in biological models and as a useful detector in practice. We also show that keypoints can be used as a data selection step, significantly reducing the complexity of state-of-the-art object categorisation. (C) 2014 Elsevier B.V. All rights reserved.
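    BIMP itself is built from models of V1 simple, complex and end-stopped cells; as a loose, non-biological analogue of the same principle (respond at crossings and junctions while suppressing responses along straight lines and edges), a Harris-style measure on image derivatives behaves similarly:

```python
def junction_keypoints(img, k=0.05, thresh=0.01):
    """Detect junction-like points: strong gradients in two directions
    score highly, while straight edges are suppressed by the -k*(trace)^2
    term. A crude stand-in for BIMP's derivative + inhibition scheme."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):          # central-difference gradients
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0
            for dy in (-1, 0, 1):          # 3x3 structure-tensor window
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx; b += gx * gy; c += gy * gy
            resp[y][x] = a * c - b * b - k * (a + c) ** 2  # edge suppression
    pts = []
    for y in range(1, h - 1):              # keep local maxima only
        for x in range(1, w - 1):
            r = resp[y][x]
            if r > thresh and all(r >= resp[y + dy][x + dx]
                                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                pts.append((y, x))
    return pts

# Synthetic 8x8 image: a bright 4x4 square; its four corners should fire.
img = [[1.0 if 2 <= y <= 5 and 2 <= x <= 5 else 0.0 for x in range(8)]
       for y in range(8)]
print(sorted(junction_keypoints(img)))  # the four corners of the square
```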

    A parametric spectral model for texture-based salience

    Get PDF
    We present a novel saliency mechanism based on texture. Local texture at each pixel is characterised by the 2D spectrum obtained from oriented Gabor filters. We then apply a parametric model and describe the texture at each pixel by a combination of two 1D Gaussian approximations. This results in a simple model which consists of only four parameters. These four parameters are then used as feature channels and standard Difference-of-Gaussian blob detection is applied in order to detect salient areas in the image, similar to the Itti and Koch model. Finally, a diffusion process is used to sharpen the resulting regions. Evaluation on a large saliency dataset shows a significant improvement of our method over the baseline Itti and Koch model.
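    The four-parameter descriptor can be sketched roughly as follows (an illustration of the idea, not the paper's exact fitting procedure): summarise the local Gabor spectrum by the mean and spread of its orientation marginal and of its frequency marginal. The bin values and spectrum below are invented.

```python
import math

def gaussian_params(bins, energies):
    """Mean and standard deviation of a 1D energy distribution."""
    total = sum(energies)
    mean = sum(b * e for b, e in zip(bins, energies)) / total
    var = sum(e * (b - mean) ** 2 for b, e in zip(bins, energies)) / total
    return mean, math.sqrt(var)

def texture_descriptor(spectrum, orientations, frequencies):
    """4-parameter texture descriptor: one 1D Gaussian per spectral axis.

    spectrum[i][j]: Gabor energy at orientation i, frequency j.
    """
    orient_marginal = [sum(row) for row in spectrum]
    freq_marginal = [sum(spectrum[i][j] for i in range(len(spectrum)))
                     for j in range(len(spectrum[0]))]
    mu_o, sd_o = gaussian_params(orientations, orient_marginal)
    mu_f, sd_f = gaussian_params(frequencies, freq_marginal)
    return mu_o, sd_o, mu_f, sd_f

# Hypothetical 4-orientation x 3-frequency Gabor responses at one pixel:
# energy concentrated near 90 degrees and the middle frequency band.
spectrum = [[0.0, 0.1, 0.0],
            [0.1, 0.2, 0.1],
            [0.2, 2.0, 0.2],
            [0.1, 0.2, 0.1]]
desc = texture_descriptor(spectrum, orientations=[0, 45, 90, 135],
                          frequencies=[1, 2, 4])
print(desc)  # four numbers: orientation mean/spread, frequency mean/spread
```

    Running Difference-of-Gaussian blob detection over each of the four resulting parameter channels then yields the salient regions.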

    Re-identification of individuals from images using spot constellations : a case study in Arctic charr (Salvelinus alpinus)

    Get PDF
    The long-term monitoring of Arctic charr in lava caves is funded by the Icelandic Research Fund, RANNÍS (research grant nos. 120227 and 162893). E.A.M. was supported by the Icelandic Research Fund, RANNÍS (grant no. 162893) and a NERC research grant awarded to M.B.M. (grant no. NE/R011109/1). M.B.M. was supported by a University Research Fellowship from the Royal Society (London). C.A.L. and B.K.K. were supported by Hólar University, Iceland. The Titan Xp GPU used for this research was donated to K.T. by the NVIDIA Corporation. The ability to re-identify individuals is fundamental to the individual-based studies that are required to estimate many important ecological and evolutionary parameters in wild populations. Traditional methods of marking individuals and tracking them through time can be invasive and imperfect, which can affect these estimates and create uncertainties for population management. Here we present a photographic re-identification method that uses spot constellations in images to match specimens through time. Photographs of Arctic charr (Salvelinus alpinus) were used as a case study. Classical computer vision techniques were compared with new deep-learning techniques for mask and spot extraction. We found that a U-Net approach trained on a small set of human-annotated photographs performed substantially better than a baseline feature-engineering approach. For matching the spot constellations, two algorithms were adapted, and, depending on whether a fully or semi-automated set-up is preferred, we show how either one or a combination of these algorithms can be implemented. Within our case study, our pipeline both successfully identified unmarked individuals from photographs alone and re-identified individuals that had lost tags, resulting in an approximately 4… Our multi-step pipeline involves little human supervision and could be applied to many organisms.
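    The constellation-matching step can be pictured with a deliberately simplified proxy (this is neither of the two algorithms adapted in the paper): align two spot sets by their centroids and count how many spots find a near neighbour. All coordinates are invented.

```python
import math

def match_score(spots_a, spots_b, tol=5.0):
    """Fraction of spots in constellation A with a near neighbour in B.

    A simple proxy for constellation similarity: translate B so its
    centroid matches A's, then count spots within `tol` pixels.
    """
    def centroid(pts):
        return (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
    ca, cb = centroid(spots_a), centroid(spots_b)
    shifted = [(x - cb[0] + ca[0], y - cb[1] + ca[1]) for x, y in spots_b]
    hits = 0
    for ax, ay in spots_a:
        if any(math.hypot(ax - bx, ay - by) <= tol for bx, by in shifted):
            hits += 1
    return hits / len(spots_a)

# Hypothetical spot constellations from two photographs of the same fish;
# the second photo is shifted by a change in camera position.
photo1 = [(10, 10), (30, 12), (22, 40), (50, 35)]
photo2 = [(x + 100, y + 20) for x, y in photo1]   # same fish, translated
other  = [(15, 60), (70, 10), (40, 40), (5, 5)]   # a different individual

print(match_score(photo1, photo2))  # 1.0: every spot finds a match
print(match_score(photo1, other))   # low score: constellations differ
```

    A real pipeline would also need to handle rotation, scale and missing spots, which is where the paper's adapted matching algorithms come in.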

    Biologically inspired vision for human-robot interaction

    Get PDF
    Human-robot interaction is an interdisciplinary research area that is becoming more and more relevant as robots start to enter our homes, workplaces, schools, etc. In order to navigate safely among us, robots must be able to understand human behavior, to communicate, and to interpret instructions from humans, either by recognizing their speech or by understanding their body movements and gestures. We present a biologically inspired vision system for human-robot interaction which integrates several components: visual saliency, stereo vision, face and hand detection, and gesture recognition. Visual saliency is computed using color, motion and disparity. Both the stereo vision and gesture recognition components are based on keypoints coded by means of cortical V1 simple, complex and end-stopped cells. Hand and face detection is achieved by using a linear SVM classifier. The system was tested on a child-sized robot.

    Processbeskrivning för familjefokuserat CABLE-arbete (Process description for family-focused CABLE work)

    Get PDF
    This bachelor's thesis is part of the CABLE (Community Action Based Learning for Empowerment) project, a collaboration within the three-year project DÀr nöden Àr störst diakoni pÄ svenska (Where the need is greatest: diaconia in Swedish), in which Novia University of Applied Sciences, the Swedish-speaking congregations in Helsinki and the Helsinki Deaconess Institute participate. The aim of this work is to create a product in the form of a process description that can support diaconal workers in starting up, planning, implementing, evaluating and following up CABLE groups after the project ends in December 2018. The process description adopts a family perspective, focusing on families with children aged 0 to 3 years. The family perspective is grounded in attachment theory and Bronfenbrenner's bioecological systems theory. Literature studies and two semi-structured interviews were used in designing the process description. The work resulted in a process description comprising three consecutive phases: a planning phase, an implementation phase and a follow-up phase. The phases are presented in matrix form and are structured around the question words Who, What, Why and When, which are related to categories relevant to each phase. The process description also makes visible aspects of the work's theoretical perspectives. The matrix format makes the process description flexible and allows the congregations to easily adapt it to any CABLE activity, regardless of target group.